1 - Spatial Data Science

If you think that spatial data is just a set of longitude and latitude observations for a given area of geographic space, and that it should always be treated as such, you are missing out on opportunities and run a high risk of producing meaningless analysis. Spatial data is not, in fact, just a set of records of observable events and/or facts in geographical space. It is the (potential) starting point for answering relevant questions about various problems in a given reality.

In the New Era of Big Data, where decision-making is increasingly data-driven, Spatial Data Science has emerged as an essential field, with applications spanning many sectors of society (e.g. agriculture, environmental science, urban planning, clean energy generation, sanitation, water management, and much more).

For these and many other reasons, I challenge you to reflect on your convictions about Spatial Data, this important entity in the context of Big Data. A lot has changed, is changing, and will change faster and faster, especially due to the advance of visual and computational technologies. No matter your area of interest in this field, these changes presuppose, and to some extent demand, that you develop new skills.

1.1 - R for Spatial Data Science

At some point in your life you’ve heard someone say that “R is a powerful tool”. This is true, and you don’t need to make a big effort to find n reasons why. When I wrote this notebook (using R Markdown), I did not write it with the intention of explaining what R is and how it works. The purpose of this notebook goes beyond that part, which I assume you already know. After all, my goal with this notebook is to demonstrate, as one possibility, how you can use R to start your studies in Spatial Data Science.

2 - Hands on

2.1 - Interactive Maps [Point]

In this section, I start using spatial data about the occurrence (point geometries) of land degraded by erosion processes in natural and anthropogenic ecosystems in a watershed located in the Brazilian Semiarid Region. Specifically, the data was collected and organized by me over successive field campaigns in the state of Bahia between 2022 and 2024.

Land degradation is understood as “the reduction of soil capacity to generate assets and services, in qualitative and quantitative terms, due to the decline of its productive potential and of its environmental regulation capacity” (LAL, 2012). From this perspective, land degradation is a complex process, associated with natural factors and human activities. According to Silva (2023), land degradation is closely related to landscape degradation, considering the soil-landscape relationship.

In the figure below I present a record (21-06-2023, 17:06), taken by me, of an area highly degraded by an erosive process (Ravina), with a strong impact on the landscape.

To reproduce this document, you need to have R installed on your machine, as well as RStudio. RStudio is an Integrated Development Environment (IDE) for R.

In the chunk below you will set your Working Directory. To do this you use setwd() and pass the path as an argument.

setwd("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR")

In the chunk below you will install the necessary packages. As I already have these packages installed, note that I left the commands as comments, adding the #. If you don’t have these packages installed, just remove the # and run. To install a package you use install.packages().

The first package to install is leaflet, a visualization tool that is highly recommended for those who want Interactive Maps.

The second package to install is sf (Simple Features), which is very useful for those who work with maps, and very convenient for representing and working with spatial vector data, including points, lines, and polygons.

#install.packages("leaflet")
#install.packages("sf")

Once installed, in the chunk below you will load the installed packages. All you have to do is use library( ).

library(leaflet)
library(sf)

Here, in the chunk below, you will read the spatial data, in this case, point geometries (.shp). Note that it will be stored in a variable named shape.

shape <- st_read('D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/SpatialData/LandDegradation.shp')
## Reading layer `LandDegradation' from data source 
##   `D:\Documents\Music\PROJECTSR\SpatialDataScienceinR\SpatialData\LandDegradation.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 2146 features and 12 fields
## Geometry type: POINT
## Dimension:     XY
## Bounding box:  xmin: -40.69852 ymin: -11.37433 xmax: -40.67941 ymax: -11.34879
## Geodetic CRS:  WGS 84

In the chunk above, you can observe the properties of the data. The printed output shows:

  • the (selected) dataset has 2146 features (records) and 12 fields (attributes);

  • the geometry type is POINT;

  • it has dimension XY, indicating that each point consists of 2 coordinate values;

  • the bounding box, i.e. the range of X and Y values of the geometry;

  • the coordinate reference system (CRS) is geodetic, with coordinates in degrees of longitude and latitude.

In the chunk below you can create your first interactive map using the leaflet package. In this situation, your interactive map shows the information on land degradation occurrence. Note that the information is presented in clusters. If you zoom in, you’ll see that the clusters gradually dissolve. This is a very interesting approach when you have a large number of points that could visually pollute the map.

leaflet(shape) %>%
  setView(lat = -11.351051, lng = -40.684630, zoom = 16) %>% 
  addProviderTiles(providers$Esri.WorldImagery, group = "ESRI World Imagery") %>%
  addProviderTiles(providers$OpenStreetMap, group = "Open Street Map") %>%
  addLayersControl(baseGroups = c("ESRI World Imagery", "Open Street Map"),
                   options = layersControlOptions(collapsed = TRUE)) %>%
  addMarkers(clusterOptions = markerClusterOptions(),
             label = shape$Tipo,
             labelOptions = labelOptions(
               textsize = "14px",
               style = list("font-weight" = "bold",
                            padding = "5px")))

In the chunk below you can create your second interactive map using the leaflet package. Note that this time the information is categorized by the Tipo field (Type of Erosion), and is also associated with colors (Yellow, Orange and Red) that reflect the severity of the process.

Tipo <- c("Boçoroca", "Ravina", "Sulcos")

my_palette <- c("red", "orange", "yellow")

#previewColors(colorFactor(my_palette, levels = Tipo), Tipo)

factpal <- colorFactor(my_palette, levels = Tipo)

groups = unique(shape$Tipo)

map <- leaflet(shape) %>% 
  setView(lat = -11.351051, lng = -40.684630, zoom = 17.3) %>% 
  addProviderTiles(providers$Esri.WorldImagery, group = "Esri.WorldImagery")

for (i in groups) {
  data <- shape[shape$Tipo == i, ]
  map <- map %>% 
    addCircleMarkers(data = data, radius = 4, weight = 2, opacity = 1, 
                     fill = TRUE, fillOpacity = 0.2, 
                     color = ~factpal(Tipo), group = i)
}

map %>% addLayersControl(overlayGroups = groups, options = layersControlOptions(collapsed = FALSE))

A few background questions:

  • What information can you get from looking at this data?
  • Can this tool be useful in your workplace?
  • What difference does it make if you implement this Interactive Maps feature in your reports?

2.2 - Interactive Maps [Polygon]

In this section, I will use spatial data (Polygon geometries) about the population density in a neighborhood of Formosa City, Goiás, Brazil. Specifically, the raw data describing the number of inhabitants at the census sector scale was collected and made available digitally by the Brazilian Institute of Geography and Statistics (IBGE). It was therefore up to me to acquire, organize and process them (I did this at a previous stage using resources of the GIS/QGIS environment) in order to arrive at the population density.

Population density is very useful information in a variety of situations involving decision-making in cities, but not only there. For example, population density information may be relevant for identifying, mapping and monitoring sectors where there is a high risk of COVID-19 infection.

In the figure below I present a record (13-02-2024, 14:19), taken by me, of a street located in the Formosinha neighborhood, in the city of Formosa, Goiás, Brazil. Looking at the context (presence of buildings, arrangement of the houses…), you have indications that this is a sector with considerable population density.

library(leaflet)
library(sf)
library(RColorBrewer)
library(classInt)

Here, in the chunk below, you will read the spatial data, in this case, polygon geometries (.shp). Note that it will be stored in a variable named Population.

Population <- st_read('D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/SpatialData/PopulationDensity.shp')
## Reading layer `PopulationDensity' from data source 
##   `D:\Documents\Music\PROJECTSR\SpatialDataScienceinR\SpatialData\PopulationDensity.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 21 features and 4 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: -47.33796 ymin: -15.56236 xmax: -47.32025 ymax: -15.54231
## Geodetic CRS:  WGS 84

If your data is in another CRS, in the chunk below you can transform it to the projection expected by leaflet. In this particular case, this does not apply, because the data being read is already in the correct CRS.

#Population <- st_transform(Population, crs = '+proj=longlat +datum=WGS84')

Once this is understood, with the command in the chunk below you can plot the interactive map.

leaflet() %>%
  addProviderTiles(providers$Esri.WorldImagery, group = "ESRI World Imagery") %>%
  addProviderTiles(providers$OpenStreetMap, group = "Open Street Map") %>%
  addLayersControl(baseGroups = c("ESRI World Imagery", "Open Street Map"),
  options = layersControlOptions(collapsed = TRUE)) %>%
  setView(lng = -47.33, lat = -15.55, zoom = 14) %>%
  addMiniMap(width = 150, height = 150) %>%
  addPolygons(data = Population, color = "black", stroke = TRUE, weight = 1, opacity = 0.8) 

In the chunk below, set the parameters to create a map categorized by the population density of the neighborhood.

breaks_qt <- classIntervals(c(min(Population$Densid_Ha) - .00001, Population$Densid_Ha), n = 5, style = "jenks")
pal_fun <- colorBin("YlOrRd", domain = NULL, bins = breaks_qt$brks)
p_density <- paste0("<strong>Densid_Ha: </strong>", Population$Densid_Ha)

Considering the defined parameters, here you can plot and see the interactive map:

leaflet(Population) %>%
  addPolygons(
    stroke = TRUE, 
    color = "black",
    fillColor = ~pal_fun(Densid_Ha),
    fillOpacity = 0.8, smoothFactor = 0.5,
    popup = p_density,
    group = "Population") %>%
  addTiles(group = "OSM") %>%
  addProviderTiles("Esri.WorldImagery", group = "ESRI World Imagery") %>%
  setView(lng = -47.33, lat = -15.55, zoom = 14) %>%
  addLegend("bottomright", 
            colors = brewer.pal(5, "YlOrRd"), 
            labels = paste0("Up to ", format(breaks_qt$brks[-1], digits = 2)),
            title = 'Population Density (h/ha)') %>% 
  addLayersControl(baseGroups = c("OSM", "ESRI World Imagery"), 
                   overlayGroups = c("Population"))  

As obvious as it may seem, when working with spatial data, ALWAYS REMEMBER: data cannot explain itself, because it cannot talk; the map does not show/indicate/suggest anything on its own, because the map does not have a life of its own. YOU are expected to do this, and what’s more, YOU are expected to see beyond the data. The limited set of data we are working with is not always enough to answer our questions; it is up to you to know its limits and the generalization power of the information.

Thank you!

2.3 - Interactive Maps [Raster]

Rasters are a particular form of data in Spatial Data Science.

In this section, I will use spatial data (Raster) on Cerrado Remnants in the Ribeirão João Leite Watershed, located in the Metropolitan Region of Goiânia. This data is authored by Silva and Nascimento (2024). In that paper, the authors demonstrate the potential of Earth Observation data (obtained by Orbital Remote Sensing) for mapping Cerrado Remnants, providing crucial information for Environmental Planning and Integrated Territorial Management at the watershed scale.

The remnants of the Brazilian Cerrado are critical points for its conservation (MYERS et al., 2000). The total or partial disappearance of the continuous and natural landscapes of the Brazilian Cerrado is gradually affecting its biodiversity and important ecosystem services, such as the regulation of the carbon balance and the availability of water (FRANÇOSO et al., 2021; SANTOS et al., 2021). Cerrado Remnants play a crucial role in providing ecosystem services, such as regulating the climate, reducing surface runoff, helping to control soil erosion, and protecting and contributing to the balance of aquatic ecosystems, among others (HOFMANN et al., 2021; HOFMANN et al., 2023). Despite this, studies show that in 2022 the suppression of natural vegetation cover in the Brazilian Cerrado increased by 25% compared to 2021, indicating that more than 50% of its remnants have already been suppressed (BARBOSA, 2011; MESSIAS et al., 2022).

Day after day, official data record that the continuous landscapes of natural vegetation cover in the Brazilian Cerrado continue to be intensely fragmented and are increasingly limited to fragments that are not only isolated from each other, but also pressured by the anthropogenic activities that take place around them (MOTA JUNIOR, SILVA and SILVA, 2021; SILVA, SILVA and SOUZA, 2022; MOTA JUNIOR, TRENTIN and SILVA 2023; CHAVES et al., 2023). Assessing the environmental quality of these Cerrado remnants is therefore vital to understanding their current condition and providing support for their conservation (LONGO et al., 2024; OLIVEIRA, SILVA and NASCIMENTO, 2024).

I leave you with a question:

What are the challenges and opportunities, today, for the conservation of Cerrado Remnants in the João Leite watershed, one of the main sources of water supplying the population of the Goiânia Metropolitan Region?

In the chunk below you will set your Working Directory. To do this you use setwd() and pass the path as an argument.

setwd("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR")

In the chunk below you will load the necessary packages. Specifically, library(sf) will make it possible to load and read the vector data (Polygon) relating to the boundaries of the Ribeirão João Leite Watershed. On the other hand, library(raster) will allow you to load and read the raster data referring to the Cerrado Remnants that were mapped by Silva and Nascimento (2024).

The library(raster) is fundamental for manipulating and analyzing spatial raster data in R. It provides an efficient data structure for dealing with large sets of raster data, such as satellite images, digital elevation models (DEM), climate data, and more. Finally, library(mapview) will provide us with an interactive interface for visualizing the spatial data in question.

Maybe library(mapview) is a ‘little more’ practical than library(leaflet). What do you think? The fact is that library(mapview) makes it significantly easier to create interactive and exploratory maps directly in the R Markdown environment. That’s what I think, anyway.

If you don’t have the libraries installed, to reproduce this notebook you need to install them. As I already have them installed, I’ll just load them in the chunk below.

library(sf)
library(raster)
library(mapview)

In the chunk below you execute the following routine:

  • Load and read the vector data (Polygon) referring to the boundaries of the Ribeirão João Leite Watershed.

  • Create an Interactive Map of the boundaries of the Ribeirão João Leite Watershed.

Watershed_JL <- "D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/SpatialData/Watershed_João_Leite_ANA.shp"
Watershed_JL <- st_read(Watershed_JL)
## Reading layer `Watershed_João_Leite_ANA' from data source 
##   `D:\Documents\Music\PROJECTSR\SpatialDataScienceinR\SpatialData\Watershed_João_Leite_ANA.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 8 features and 9 fields
## Geometry type: POLYGON
## Dimension:     XY
## Bounding box:  xmin: 685256.6 ymin: 8158855 xmax: 718330 ymax: 8206917
## Projected CRS: SIRGAS 2000 / UTM zone 22S
Map_Watershed_JL <- mapview(Watershed_JL, alpha.regions = 0, lwd = 3, legend = FALSE)
Map_Watershed_JL

In the chunk below you execute the following routine:

  • Load and read the raster data (Raster) referring to the Cerrado Remnants in the Ribeirão João Leite Watershed.

  • Overlay the first map (boundaries of the Ribeirão João Leite Watershed) onto the second (Cerrado Remnants in the Ribeirão João Leite Watershed).

LULC_2020 <- "D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/raster/CerradoRemnants06072023S2JL.tif"
LULC_2020 <- raster(LULC_2020)
Map_LULC_2020 <- mapview(LULC_2020, na.color = "transparent", col.regions = "#008000", legend = FALSE)
Map_Watershed_JL + Map_LULC_2020

Thank you.

3 - LULC

Net and gross gains and losses are very important metrics for understanding transition events in LULC (Land Use and Land Cover). In this section…

The acronyms FF, SF, CF, PL, Wt and CL represent the Forest Formations, Savannah Formations, Countryside Formations, Pastureland, Water and Cropland, respectively.
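As a minimal sketch of these metrics, they can be derived from a cross-tabulated transition matrix, where rows give the class at time 1 and columns the class at time 2. The matrix below is hypothetical (illustrative area values, and a reduced set of the classes above), not the data of this section:

```r
# Hypothetical LULC transition matrix (area units); rows = time 1, cols = time 2.
# Classes follow the acronyms above, reduced to FF, PL and CL for illustration.
tm <- matrix(c(50,  8,  2,
                3, 30,  5,
                1,  4, 20),
             nrow = 3, byrow = TRUE,
             dimnames = list(c("FF", "PL", "CL"), c("FF", "PL", "CL")))

gross_loss <- rowSums(tm) - diag(tm)   # area each class lost to other classes
gross_gain <- colSums(tm) - diag(tm)   # area each class gained from other classes
net_change <- gross_gain - gross_loss  # positive = net expansion

gross_loss
gross_gain
net_change
```

The diagonal holds persistence, so subtracting it from the row and column totals isolates the gross flows, and their difference gives the net change per class.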

4 - Climatic Data

5 - Charts

5.1 - Correlogram

A Correlogram is a correlation matrix plot where you can visualize the correlation coefficients between n variables. Correlograms are useful in Exploratory Data Analysis (EDA) and in understanding relationships between variables. They can help identify potential patterns or dependencies in the data, which is crucial for making informed decisions in fields like, Geography, Economics, Sociology, Biology, and many others where understanding relationships between variables is important.

In this section, I will use spatial data to calculate and visualize a correlation matrix. The data that will be used was obtained using different methods, and falls within the spatial context of the largest fragment of natural vegetation cover in the city of Formosa, Goiás, Brazil: the Mata da Bica Municipal Park (MBMP).

In this notebook I will use three libraries, library(leaflet), library(readxl) and library(corrplot). The first will generate an interactive map of the spatial context of Mata da Bica Municipal Park, in the city of Formosa, Goiás, Brazil. The second allows you to read the data in the format that is stored in an .xlsx spreadsheet; and the third allows you to generate the correlogram. If you don’t have the libraries installed, to reproduce this notebook you need to install them.

Once installed, in the chunk below you will load the installed packages.

library(leaflet)
library(readxl)
library(corrplot)
## corrplot 0.92 loaded

Using the command below, you can plot the interactive map to visualize Mata da Bica Municipal Park, the largest remaining fragment of natural vegetation cover in the city of Formosa, Goiás, Brazil.

MBMP_cord <- c(-47.336470147649806, -15.553588713481723)

MBMP <- leaflet() %>%
  setView(lng = MBMP_cord[1], lat = MBMP_cord[2], zoom = 16.3) %>% 
  addProviderTiles(providers$Esri.WorldImagery, group = "ESRI World Imagery") %>% 
  addProviderTiles(providers$OpenStreetMap, group = "Open Street Map") %>% 
  addLayersControl(
    baseGroups = c("ESRI World Imagery","Open Street Map"),
    options = layersControlOptions(collapsed = FALSE)
    )
MBMP

If you don’t know Mata da Bica Municipal Park, I really recommend that you get to know this important Urban Green Space for Recreation. In the picture below I present a record (13-02-2024, 10:49), taken by me, of a characteristic landscape in Mata da Bica Municipal Park. In my opinion, an encouraging landscape for a walk around the lake.

As I said, the data that will be used was obtained using different methods:

Enhanced Vegetation Index (EVI): obtained from the digital processing of Sentinel-2A images.

Green Normalized Difference Vegetation Index (GNDVI): obtained from the digital processing of Sentinel-2A images.

Moisture Stress Index (MSI): obtained from the digital processing of Sentinel-2A images.

Number of Days with Precipitation (NDP): obtained from records measured on site by the National Meteorological Institute, Conventional Station Nº. 83379.

Normalized Difference Water Index (NDWI): obtained from the digital processing of Sentinel-2A images.

Monthly Precipitation (P): obtained from records measured on site by the National Meteorological Institute, Conventional Station Nº. 83379.

Total Insolation (TI): obtained from records measured on site by the National Meteorological Institute, Conventional Station Nº. 83379.

Mean Relative Air Humidity (RAHm): obtained from records measured on site by the National Meteorological Institute, Conventional Station Nº. 83379.

In the chunk below you will set your Working Directory.

setwd("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR")

In the chunk below you will read the data.

Data <- read_excel("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/XLSX/DF_1.xlsx", 
                         sheet = "df", col_types = c("numeric", "numeric", "numeric", "numeric", 
                                                     "numeric", "numeric", "numeric", "numeric"))

In the chunk below, you will calculate the correlation matrix for the n (in this case 8) variables already highlighted (EVI, GNDVI, MSI, NDP, NDWI, P, RAHm, TI).

cor_matrix = cor(Data) 

In the chunk below, you will plot the correlogram.

corrplot(cor_matrix, 
         type = "lower",  
         method = "square",  
         tl.col = "black", 
         bg = "White",
         addCoef.col = "black",
         tl.srt = 45) 

It is important to mention that this is not the only way to achieve this result, of course. You can modify the aesthetics of the correlogram in different ways. For example, if you just change method = "square" to method = "circle", you will get a plot with a new profile.
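As a quick illustration of the circle variant (using a small synthetic data frame, not the MBMP dataset), only the method argument changes:

```r
library(corrplot)

# Synthetic data, for illustration only (not the MBMP dataset)
set.seed(1)
df <- data.frame(a = rnorm(30), b = rnorm(30), c = rnorm(30))
m  <- cor(df)

# Same layout as before, but circles whose size and color encode the coefficient
corrplot(m, type = "lower", method = "circle", tl.col = "black", tl.srt = 45)
```

Other method values ("ellipse", "number", "pie", "color") follow the same pattern.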

Two background questions for you:

What relevant information can you extract from this graph?

Assuming that the data is indeed reliable, how do you explain such a strong (-0.99) negative correlation between the MSI and NDWI variables?
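One hint toward that question, as a hypothetical numeric sketch: assuming the NDWI used here is the Gao formulation based on the NIR and SWIR bands (an assumption; the reflectance values below are synthetic, not the Sentinel-2A data), both indices are monotone functions of the same SWIR/NIR ratio, so a near-perfect negative correlation is structurally expected:

```r
# Synthetic NIR and SWIR reflectances (illustrative values, not Sentinel-2A data)
set.seed(42)
nir  <- runif(100, 0.2, 0.5)
swir <- runif(100, 0.1, 0.4)

msi  <- swir / nir                   # Moisture Stress Index
ndwi <- (nir - swir) / (nir + swir)  # NDWI (Gao), a decreasing function of SWIR/NIR

cor(msi, ndwi)  # negative by construction
```

In other words, the two indices largely encode the same spectral information with opposite signs.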

Thank you.

6 - SIDRA

6.1 - Barplot using SIDRA data (case 1)

The acronym “SIDRA” refers to the Brazilian Institute of Geography and Statistics (IBGE) Automatic Retrieval System, a platform developed to access and query statistical data produced by the institute itself. SIDRA provides access to a wide range of statistical information on various topics such as population, economy, agriculture, industry, trade, and much more.

SIDRA is widely used by researchers, academics, professionals, and the general public interested in statistical data about Brazil. It is a fundamental tool for academic studies, economic analysis, public policy planning, and business decision-making. In summary, SIDRA is an essential platform for accessing and exploring comprehensive and updated statistical data provided by the IBGE.

In this section, I will use data provided by SIDRA. The purpose of this notebook is to demonstrate, as one possibility, how to import and read a spreadsheet (.csv) downloaded from SIDRA, perform data preprocessing and processing, and finally plot a bar chart.

The data to be used were downloaded directly from SIDRA, within the scope of the Municipal Livestock Survey 2022 (PPM), which aims to provide statistical information on livestock herds, shorn sheep, milked cows, animal products, and aquaculture production. In this particular case, the survey is annual and covers the entire national territory, with information for Brazil, Geographic Regions, Federative Units, Geographic Mesoregions, Geographic Microregions, and Municipalities, currently covering the historical series from 1974 to 2022.

Considering the PPM tables, the data I use specifically provide information on animal products by type. The animal product type I am using is the quantity of Chicken Eggs (thousand dozens), with the territorial units being Brazil and the state of Goiás (1974-2022).

Eggs are one of the most consumed foods in the world. In many countries, eggs are an essential component of families’ diets due to their high nutritional value. In recent years, it has been observed that egg consumption has been gradually increasing in Brazil. Considering the data provided by IBGE, let’s observe the dynamics of egg production in Brazil and, particularly, in the state of Goiás, an important player in animal product production.

In the chunk below you will set your Working Directory.

setwd("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR")

In this notebook I will use just one library: ggplot2. If you don’t have this library installed, to reproduce this notebook you need to install it.

Once installed, in the chunk below you will load it.

library(ggplot2)

In the chunk below you will read the data (ignoring the first three lines, reading the next three lines).

SIDRA_Data <- read.csv2("D:/Documents/Music/PROJECTSR/SpatialDataScienceinR/CSV/avicultura_br_and_go_mil_duzia.csv", skip = 3, nrows = 3)

In the chunk below, you will visualize SIDRA_Data.

View(SIDRA_Data)

I need to remove the first row (Chicken eggs;Chicken eggs;Chicken eggs;Eggs….) and the first column (Brazil.and.Unit.of.Federation).

SIDRA_Data <- SIDRA_Data[-1, -1]

In the chunk below, you can visualize SIDRA_Data after removing the fields highlighted above.

View(SIDRA_Data)

Now you need to remove the string “X” from the column headings (X1974, X1975, X1976…). You can do it:

colnames(SIDRA_Data) <- sub("^X", "", colnames(SIDRA_Data))

In the chunk below, you need to create a vector and store the year’s information as an integer. You can do it:

Years <- as.integer(colnames(SIDRA_Data))

In the chunk below, you need to reproduce the routine presented above for the egg production values for the case of Brazil. However, unlike the previous case, you need to save the values as numerical. You can do it:

Eggs_Production_Brasil <- as.numeric(as.character(unlist(SIDRA_Data[1,], use.names = FALSE)))

In the chunk below, you need to reproduce the routine presented above for the egg production values for the state of Goiás, in the same way as for Brazil, saving the values as numeric. You can do it:

Eggs_Production_Goias <- as.numeric(as.character(unlist(SIDRA_Data[2,], use.names = FALSE)))

Now you can access this data and create a new DataFrame(DF). You can do it:

SIDRA_DF <- data.frame(Eggs_Production_Brasil, Eggs_Production_Goias, Years)

In the chunk below, you can visualize the new DF.

View(SIDRA_DF)

In the chunk below, you can plot a barplot of Brazil’s egg production.

ggplot(SIDRA_DF, aes(Years, Eggs_Production_Brasil)) +
  geom_bar(stat = "identity", position = 'dodge') +
  labs(x = "Years", y = "Eggs Production (thousand dozens)") +
  theme_minimal()

Note that the way the values on the Y axis are presented, referring to egg production in Brazil in thousands of dozens, makes it difficult to understand the data. The graph needs to be organized in such a way as to make it easy to read, so that the information can be extracted without any further effort. So, I’m going to convert the values from thousands to millions (of dozen eggs) by dividing the value of each record by 1,000.

In the chunk below, I will define the parameters for customizing the barplot of Brazil’s egg production. I need a plot that looks a bit better.

windowsFonts(TimesNewRoman = windowsFont("Times New Roman"))

tema <- theme(text = element_text(family = "TimesNewRoman"),
              title = element_text(color = "black", size = 12),
              axis.text = element_text(color = "black", size = 13),
              axis.title = element_text(color = "black", size = 13),
              panel.grid = element_line(color = "gray", linetype = "dotted", size = 0.5),
              axis.line = element_blank(),
              plot.background = element_rect(fill = "white", color = "white"),
              panel.background = element_rect(fill = "white"),
              panel.border = element_rect(colour = "black", fill = NA, size = 0.59),
              legend.key = element_rect(color = "white", fill = "white"))

In the chunk below, you can plot the historical series of egg production in Brazil (millions of dozen), also taking into account the customization parameters already defined.

ggplot(SIDRA_DF, aes(Years, Eggs_Production_Brasil/1000)) +
  geom_bar(stat = "identity", position = 'dodge') +
  labs(title = "Egg production (millions of dozen) in Brazil: 1974-2022", x = "Years", y = "Eggs Production (millions of dozen)") +
  tema

In fact, we can see from the data provided by the IBGE that egg production in Brazil has followed a trend of exponential growth since 2010.

A background question for you:

Who eats so many eggs?

In the chunk below, you can plot the historical series of egg production in Goiás (millions of dozen), also taking into account the customization parameters already defined.

ggplot(SIDRA_DF, aes(Years, Eggs_Production_Goias/1000)) +
  geom_bar(stat = "identity", position = 'dodge') +
  labs(title = "Egg production (millions of dozen) in Goiás state: 1974-2022", x = "Years", y = "Eggs Production (millions of dozen)") +
  tema

As has been observed throughout Brazil, we can also see a trend towards exponential growth in the state of Goiás.

Background questions for you:

Why are so many eggs produced in Goiás? Why is production increasing? And who consumes so many eggs?

Thank you!

7 - Google Earth Engine

8 - Brazil Data Cube


  1. If you like it and want to contribute: pix: izaiasdesouzasilvaa@gmail.com↩︎